Abstract

Background: Psychometric properties include validity, reliability and sensitivity to change. Establishing the
psychometric properties of an instrument that measures three-dimensional human posture is essential prior to
applying it in clinical practice or research.

Methods: This paper reports the findings of a systematic literature review which aimed to 1) identify non-invasive 
three-dimensional (3D) human posture-measuring instruments; and 2) assess the quality of reporting of the 
methodological procedures undertaken to establish their psychometric properties, using a purpose-built critical
appraisal tool.

Results: Seventeen instruments were identified, of which nine were supported by research into psychometric 
properties. Eleven papers reported on validity testing and six on reliability testing. Rater qualification and
reference standards were generally poorly addressed, and the quality of reporting of rater blinding and
statistical analysis was variable.

Conclusions: There is a lack of current research to establish the psychometric properties of non-invasive 3D 
human posture-measuring instruments. 
Background

Postural assessment is a standard and essential component
of examining individuals with neuromusculoskeletal
disorders [1,2]. Prolonged static postures are widely
recognised as a risk factor for neuromusculoskeletal pain
among children, adolescents and adults [3-9]. No uniform
definition of "ideal" posture exists, and researchers and
clinicians therefore continue to seek the best way of
assessing and describing posture. Ideal spinal posture
is proposed as neutral spinal alignment; however, the
relationship between spinal segments in a normal
population remains unknown [10,11]. The spine is a complex
three-dimensional (3D) anatomical structure whose
segmental position in space should be described in all three
planes (sagittal, frontal and transverse) [12-14]. Precise
positional data can be derived from a number of
biomechanical measurement tools, of which non-invasive 3D
instruments are preferred.

It is essential that a spinal posture-measuring instru- 
ment is shown to be reliable and valid. Without this 
assurance, it cannot facilitate diagnosis, chart variability 
in 'usual' posture or assist objective monitoring of 
patient progress with treatment [1]. Researchers and 
clinicians should therefore be familiar with the psycho- 
metric properties of spinal posture-measuring instru- 
ments, and choose the ones with the best evidence of 
performance [15]. 

Two core elements of psychometric properties are
reliability and validity [16]. The two are interlinked,
and reliability is a prerequisite for validity.
A measurement tool cannot be recommended with
confidence if there is a lack of evidence about its
reliability and validity [17]. Reliability refers to being able to
estimate the inherent variability of posture, as well as
the error that can be attributed to the rater and the
measurement instrument [17]. Error can relate to the
consistency with which measurements are taken by the
same or different raters, or over multiple occasions of
testing [16]. Reliability is variously classified as
test-retest, inter-rater and intra-rater reliability. Test-retest
reliability describes the stability of the measurement
instrument in obtaining the same results with repeated
measurements using the identical test on two or more
separate occasions, keeping all testing conditions as
constant as possible [17]. Intra-rater reliability is defined as
the stability of data recorded by one observer across two
or more test occasions. Inter-rater reliability is the
extent to which two or more observers obtain similar
scores when rating the same individuals [16,17].
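
As a rough illustration of how the stability of repeated measurements is often summarised, the sketch below computes a one-way random-effects intraclass correlation coefficient, ICC(1,1), from repeated posture angles. The function name `icc_oneway` and the angle values are hypothetical and are not taken from any of the reviewed papers; other ICC forms may be more appropriate, depending on the study design.

```python
from statistics import mean


def icc_oneway(ratings):
    """One-way random-effects ICC(1,1).

    ratings: one list per subject, each holding k repeated measurements.
    """
    n = len(ratings)        # number of subjects
    k = len(ratings[0])     # measurements per subject
    grand_mean = mean(v for row in ratings for v in row)
    subject_means = [mean(row) for row in ratings]

    # Mean squares from a one-way ANOVA with subjects as the grouping factor.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (v - m) ** 2 for row, m in zip(ratings, subject_means) for v in row
    ) / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical thoracic angles (degrees) for four subjects, each measured twice.
angles = [[30.0, 31.0], [42.0, 41.0], [25.0, 27.0], [36.0, 36.0]]
icc = icc_oneway(angles)
```

A value close to 1 indicates that most of the observed variance lies between subjects rather than between repeated measurements of the same subject.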

Validity is the extent to which an instrument measures 
what it is intended to measure [18]. Criterion-related 
validity is the ability of one test (index test) to predict 
results obtained on an external criterion (gold standard/ 
reference standard) which is assumed to be valid. When 
both tests are performed on the same subjects, the 
scores from the index test are correlated with those 
achieved by the criterion measure. Construct validity is 
the ability of an instrument to measure an abstract con- 
cept, which cannot be observed directly and which has 
been constructed to represent an abstract trait [17]. 
There are two types of criterion-related validity.
Concurrent validity is evaluated when the index test and the
criterion measure are taken at the same time, so that both
reflect the same incident of behaviour. Predictive
validity is tested when the index test is performed first and
the criterion is measured prospectively, to determine
whether the index test is a valid predictor of the
outcome [17]. There are three types of construct validity.
Convergent validity indicates that two measures
which are believed to reflect the same construct will
have similar results or will correlate highly [17],
whereas divergent validity indicates that two measures
which are believed to measure different constructs will
correlate poorly [19]. Convergent and divergent validity
assess the sensitivity and specificity of a measurement,
respectively [19]. Discriminative validity is the extent to
which measures from a measurement instrument
distinguish between individuals or populations that would
be expected to differ [19].
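
The correlation between index-test and criterion scores described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the angle values are hypothetical, and the mean difference is added only to show that a high correlation does not by itself rule out a systematic offset between the two instruments.

```python
from math import sqrt
from statistics import mean


def pearson_r(index_scores, reference_scores):
    """Pearson correlation between index-test and reference-standard scores."""
    mean_x, mean_y = mean(index_scores), mean(reference_scores)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(index_scores, reference_scores))
    sxx = sum((x - mean_x) ** 2 for x in index_scores)
    syy = sum((y - mean_y) ** 2 for y in reference_scores)
    return sxy / sqrt(sxx * syy)


def mean_difference(index_scores, reference_scores):
    """Average signed difference (index minus reference), i.e. systematic offset."""
    return mean(x - y for x, y in zip(index_scores, reference_scores))


# Hypothetical angles (degrees): the index test reads 2 degrees low throughout.
index_test = [10.0, 20.0, 30.0, 40.0]
reference = [12.0, 22.0, 32.0, 42.0]
r = pearson_r(index_test, reference)
bias = mean_difference(index_test, reference)
```

In this constructed example the two sets of scores are perfectly correlated even though every index reading is 2 degrees below the reference, which is one reason agreement statistics are often reported alongside correlation.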

Establishing the psychometric properties of spinal
posture-measuring instruments is not a trivial task, given
the complex nature of human posture. Thus, convincing
evidence of reliability and validity of any
posture-measuring instrument can only be established by assessing
the methodological quality of the underpinning
developmental studies. Specific features of psychometric study
design are therefore essential to establish and assess,
for instance the controls that are put in place for systematic
bias, non-systematic bias and inferential error. An
important requirement for psychometric testing of
posture measurement is that the instrument be tested
under a given set of conditions on a specific population
within the context of the instrument's intended use.
Therefore, it is essential that posture-measuring
instruments be tested on humans at some stage of
development, and not just on inanimate objects [17].

The purpose of the systematic review reported in this 
paper was 1) to identify the non-invasive 3D tools which 
measure human static sitting or standing spinal posture 
and 2) to review the quality of the evidence of reliability 
and validity of the identified 3D posture-measuring 
instruments. 

Methods 

Search Strategies 

Two inter-related search strategies (A and B) were 
implemented to ensure that all eligible papers were 
included. Strategy A sought any primary research stu- 
dies which reported the use of 3D non-invasive instru- 
ments measuring static sitting or standing spinal 
posture. Strategy B sought primary research into the 
psychometric testing of these instruments. One reviewer 
searched six electronic databases that were available at 
the Stellenbosch University Library. The databases were 
BioMed Central, CINAHL, PEDRO, PROQUEST, 
PUBMED and SCIENCE DIRECT. The publication date 
was restricted to papers published from 1980 to June 
2010. The search was limited to full-text papers pub- 
lished in English. MESH terms were used in PUBMED. 
See additional file 1 for a detailed description of the 
database searches. 

In addition, secondary searching was performed on
the reference lists of the included papers. Experts in this
field of research, and authors who failed to provide
references to studies which tested an instrument's
psychometric properties, were contacted.

Keywords and synonyms 

The following keywords were used: three-dimensional, 
measurement tool, assessment tool, instrument, mea- 
surement, assessment, spinal posture, posture, validity, 
reliability, accuracy and reproducibility. 

Inclusion and exclusion criteria for selection of papers 

Papers were included if they reported testing an
instrument's psychometric properties, specifically reliability
and/or validity, using humans, or the instrument's
validity using objects. A core inclusion criterion was that static
standing or sitting spinal posture had to be evaluated
with an instrument that could quantitatively calculate
3D spinal posture without using a baseline reference
value such as zero. This was because a reference value
requires the subject to first assume a neutral or resting
posture, at which point the instrument is zeroed before
it can measure static spinal posture. For the purpose of
the review, static posture had to be assessed
instantaneously without any guidance from the researcher.

Papers were excluded if (1) they reported neither
reliability nor validity testing; (2) they did not report on
static spinal posture (e.g. reported on the 3D motion of the
spine, scapulo-humeral girdle or pelvis); (3) the study
reported on the validity testing of an instrument using
motion (as motion was not incorporated in this review,
and we argue that validity should be evaluated within the
context of the instrument's intended use); (4) the instrument
only measured cadaver or in vitro spinal posture; (5) the
instrument was invasive, e.g. biplanar radiography and
stereoradiography; or (6) only an algorithm or a
mathematical formula was reported.

Study selection 

One reviewer excluded papers by screening all the titles
and reading the abstracts, after which two independent
reviewers selected the eligible papers after reading the
full text versions of the remaining papers. Figure 1
describes the procedures of study selection for each of
the two search strategies.





[Figure 1: flowchart.
Search Strategy A: search for papers that measured 3D static spinal posture; check their references for the psychometric properties of the instrument used; search the referenced papers; if the instrument was not referenced, the authors were contacted; if there was no response from the author, the instrument/article was excluded.
Search Strategy B: search for papers that tested the validity / reliability of an instrument that measures 3D static spinal posture.
In both strategies, papers (including referenced papers and papers retrieved from authors) were accepted only if they adhered to the inclusion and exclusion criteria.]

Figure 1 A flowchart to demonstrate the procedures for study selection
Methodological Quality Appraisal

The full text eligible papers were then subjected to
methodological critical appraisal. The Critical Appraisal Tool
(CAT) applied in this review was purpose-built, in the
absence of any other relevant CAT. It was adapted from
the Quality Assessment of Diagnostic Accuracy Studies
(QUADAS) [20] and the Quality Appraisal of Reliability
Studies (QAREL) [21]. The purpose-built CAT has 13
items; however, its data are not designed to be reported as a
composite quality score (see additional file 2). The CAT
was designed to assess the impact of each individual item
on the quality of the methodological procedures
implemented in each paper. Prior to critical appraisal of the
included articles, three papers were randomly selected and
assessed independently by three reviewers using the
purpose-built CAT. Disagreements were discussed to ensure
that interpretation of the CAT items was consistent.

Results 

Results from the search strategies 

One hundred and thirty possible papers were consid- 
ered, of which 30 papers were deemed to be eligible. 
Nine additional papers were identified after searching 
the reference lists of these papers. Two further papers 
were included after experts and authors had been con- 
tacted. Figure 2 provides a consort diagram to demon- 
strate the selection of papers. 

Volume of literature 

Eighteen instruments were identified from the two
literature searches, 15 from Search A, one from Search B
and two from author contacts. The instruments are
listed in the first column of Table 1, the papers
addressing aim one appear in the second column and those
addressing aim two are in the third column. Papers
reporting these instruments are identified by bold script
if from strategy A, italics if from strategy B, normal
script if from author search and with a * if from
secondary searching. The Automatic Scoliosis Analyser System
(Auscan) (Italy), the Elite system (Italy), the Optotrak
3020 (Canada), the Peak Motus (USA), the PosturePrint
(Canada), the Qualysis Proreflex Motion Capture Unit
system (Sweden), the Vicon 370 (England) and an
Optoelectronic camera system (Canada) are
optoelectronic analysis systems. The Fonar upright positional MRI
(USA) uses magnetic resonance imaging. The INSPECK
(Canada) is an optical 3D digitizer. The Lumbar Motion
Monitor (LMM) (USA) is an electrogoniometer. The
Metrecom (USA), the Articulated Arm for Computerized
Surface Measurement (BACES) (Italy) and the
Microscribe 3DX Digitizer (USA) are computerized
electromechanical 3D digitizers. Rasterstereography is a
photogrammetric method based on triangulation. The 3
Space Isotrak or Fastrak (USA) and the Electromagnetic
tracking system (USA) are electromagnetic devices. The
Zebris (Germany) is an ultrasound analysis system.

[Figure 2: flow diagram, steps in order.
Papers screened by title for both search strategies A and B: N=9717
Apply inclusion criteria to the title and exclude papers: N=9355
Screen abstracts and exclude papers: N=98
Apply inclusion criteria to the full text papers and exclude papers: N=130
Exclude duplicate papers: N=104
Full text papers reviewed and verified by reviewers: N=30
Secondary searching of eligible papers to include papers: N=9
Papers included after experts and authors had been contacted: N=2
Total papers included to address aim 1: N=24
Total papers included to address aim 2: N=17]

Figure 2 Consort diagram to demonstrate the selection of papers

Seventeen papers reported on reliability and/or validity 
of the included instruments and were thus assessed to 
address Aim two (see Table 1 third column). One paper 
by Smidt et al. [22] reported on both reliability and 
validity, and was therefore reviewed as if it was two 
separate papers, due to the nature of this review. Drerup 
et al. [23] tested a new algorithm for processing data 
presented in a previous paper [24]. These papers were 
reviewed as if they were one paper, because the previous 
paper reported on the study procedure in more detail 
whereas the latter paper discussed the latest improve- 
ment made on the data processing procedure. 

Aim of the reliability studies 

The aim of six studies was to test the reliability of a 3D 
instrument in assessing the spinal posture of humans 
[22,25-29]. 

Aim of the validity studies 

The aim of eleven studies was to test the validity of a
3D posture instrument. Four studies [23,30-32] used
human subjects to measure 3D spinal posture and to
compare the results with those obtained from a
reference standard. The other seven studies used
mannequins [33-35], wooden wedges [36], a steel frame [22],
parallelograms [37] or other objects with known
parameters [38] to test the validity of an instrument that
could be used to assess 3D spinal posture of humans in
future.

Table 1 Recent three-dimensional instruments used to measure static spinal posture

Instrument | Addresses Aim 1: Used to measure posture | Addresses Aim 2: Reports on psychometric properties | N
BACES | D'Osualdo et al. 2002 [41] | - | -
AUSCAN | Negrini et al. 2007 [42] | - | -
Electromagnetic tracking system | Claus et al. 2009 [43] | - | -
Elite optoelectronic system | Lissoni et al. 2001 [44]; Naslund et al. 2005 | - | -
Inspek | - | Pazos et al. 2005* [35]; Pazos et al. 2007 [27] | 2
Lumbar Motion Monitor | Jang et al. 2007 [46] | - | -
FONAR upright positional MRI | Morl et al. 2006 [47]; Cargill et al. 2007 [48]; Lafon et al. 2010 [49] | - | -
Metrecom | Franklin et al. 1995 [50]; Black et al. 1996 [51]; Gram et al. 1999 [52] | Smidt et al. 1992* [22]; Norton et al. 1993* [38] | 2
Microscribe 3DX Digitizer | - | Warren et al. 2005 [28] | 1
Optoelectronic camera system | Duong et al. 2009 [53] | - | -
Optotrak 3020 | Rempel et al. 2007 [54] | - | -
Peak Motus | Straker et al. 2009 [55] | - | -
Postureprint | - | Normand et al. 2002 [37]; Harrison et al. 2007 [33]; Janik et al. 2007 [34]; Normand et al. 2007 [26] | 4
Qualysis Proreflex Motion Capture Unit system | Grip et al. 2007 [56]; Neiva et al. 2009 [57] | - | -
Rasterstereography | - | Stokes et al. 1988* [32]; Hackenberg et al. 2003a [30]; Hackenberg et al. 2003b [31]; Drerup et al. 1994* [23] and 1996* [24] | 5
3 Space Isotrak/Fastrak | O'Sullivan et al. 2006* [58]; Caneiro et al. 2010 [59]; Astfalck et al. 2010 [60] | Pearcy et al. 1989* [36] | 1
Vicon three-dimensional kinematic system | Levine et al. 1996 [61]; Szeto et al. 2005 [9]; Skalli et al. 2006 [62] | Whittle et al. 1997 [29] | 1
Zebris CMS70P; Zebris CMS20 | Theisen et al. 2010 [63] | Geldhof et al. 2007 [25] | 1

N: number of papers addressing aim 2; Bold script: Papers from search A; Italic script: Papers from search B; *: Papers from secondary search; Normal script: Papers from author search

Study design for reliability and validity studies 

The type of reliability and validity tested, as well as the 
time interval for the reliability studies and the refer- 
ence standard for the validity studies, are reported in 
Table 2. 

Statistical analysis 

Table 3 summarizes the statistical procedures
implemented in the reliability and validity studies. Comparing
the findings in this table with the types of reliability and
validity testing reported in Table 2 highlights the
variability in choice and application of statistical tests to
assess the same constructs.
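
To make some of the statistics listed in Table 3 more concrete, the sketch below shows commonly used textbook definitions of the root mean square (RMS) deviation, the typical error of measurement (TEM) and the smallest detectable difference (SDD). These formulas are assumptions made for illustration; the reviewed papers may have computed these quantities differently, and the function names are hypothetical.

```python
from math import sqrt
from statistics import stdev


def rms_deviation(index_values, reference_values):
    """Root mean square of the paired differences between two measurement series."""
    diffs = [a - b for a, b in zip(index_values, reference_values)]
    return sqrt(sum(d * d for d in diffs) / len(diffs))


def typical_error(trial_1, trial_2):
    """Typical error of measurement: SD of the test-retest differences / sqrt(2)."""
    diffs = [a - b for a, b in zip(trial_1, trial_2)]
    return stdev(diffs) / sqrt(2)


def smallest_detectable_difference(tem):
    """Smallest change exceeding measurement error at the 95% level: 1.96 * sqrt(2) * TEM."""
    return 1.96 * sqrt(2) * tem


# Hypothetical repeated measurements (degrees) on three subjects.
tem = typical_error([10.0, 12.0, 14.0], [11.0, 11.0, 15.0])
sdd = smallest_detectable_difference(tem)
```

Because these statistics answer different questions (agreement with a reference, within-subject noise, and the threshold for a real change), results from studies that report different ones are not directly comparable.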

Methodological Quality Appraisal 

Table 4 reports the findings from the critical appraisal 
of the papers, related to reliability and validity testing. 

Item 1: If human subjects were used, did the authors 
give a detailed description of the sample of subjects used 
to perform the (index) test? 

Nine papers [22,25-32] scored "yes" because a detailed 
description of the sample characteristics was stated. 
Drerup et al. [23] scored "no" as the authors did not 
mention how their subjects were recruited and merely 
stated that only scoliosis patients were included. Seven 
papers [22,33-38] scored "not applicable" because these 
studies used inanimate objects. 
Table 2 The type and time interval for reliability studies and the type and reference standard for validity studies

Author | Type of reliability | Time interval | Type of validity | Reference standard
Stokes et al (1988) | N/A | N/A | Criterion-related validity | Stereoradiography
Pearcy et al (1989) | N/A | N/A | Concurrent validity | Precision optical inclinometer
Smidt et al (1992) | N/A | N/A | Concurrent validity | Not specified
Smidt et al (1992) | Intra- and interrater reliability | On the same day | N/A | N/A
Norton et al (1993) | N/A | N/A | Concurrent validity | Tape measure or ruler
Drerup et al (1996) | N/A | N/A | Criterion-related validity | Stereoradiography
Normand et al (2002) | N/A | N/A | Concurrent validity | Not specified
Hackenberg et al (2003a) and Hackenberg et al (2003b) | N/A | N/A | Criterion-related validity | Stereoradiography
Pazos et al (2005) | N/A | N/A | Concurrent validity | Coordinate measuring machine
Harrison et al (2007) and Janik et al (2007) | N/A | N/A | Concurrent validity | Not specified
Whittle et al (1997) | Intrarater reliability | On the same day | N/A | N/A
Warren et al (2005) | Intrarater reliability | One minute | N/A | N/A
Geldhof et al (2007) | Intrarater reliability | One week | N/A | N/A
Pazos et al (2007) | Test-retest reliability | 30 seconds | N/A | N/A
Normand et al (2007) | Intra- and interrater reliability | One day | N/A | N/A

N/A: Not Applicable



Item 2: Did the authors clarify the qualification, or 
competence of the rater(s) who performed the (index) 
test? 

Eleven validity studies [22,23,30-38] and four reliability
studies [25,27-29] scored "no". The qualifications of the
operators of the instruments were not reported, as there
was no description of their past experience with
operating these instruments. The reliability studies of Smidt et
al. [22] and Normand et al. [26] scored "yes" as they
stated that the operators were "familiar and competent" in
the use of the instruments.

Item 3: Was the reference standard explained? 



Table 3 Statistical procedures of the reliability and validity studies

Author | Statistical analysis
Stokes et al (1988) | Linear regression analysis and Pearson correlation coefficient
Pearcy et al (1989) | Means, estimate of error, regression analysis and ICC
Smidt et al (1992) | Dunnett's comparison test
Norton et al (1993) | Pearson product moment correlation coefficient (r) and repeated measures t test
Drerup et al (1996) and Hackenberg et al (2003a and b) | Root mean square (RMS) deviations of the surface curves from the radiographic curves
Whittle et al (1997) | ICC and Pearson correlation coefficient
Normand et al (2002) | Means, SD, SEM, 95% Confidence Intervals (CI) and mean differences
Pazos et al (2005) | Multiway ANOVA
Warren et al (2005) | Pearson correlation coefficient and ICC
Harrison et al (2007) and Janik et al (2007) | Error analyses of mean differences and SD
Geldhof et al (2007) | ICC for test-retest reliability
Pazos et al (2007) | Bivariate ANOVA; typical error of measurement (TEM); 95% CI of the TEM; smallest detectable difference (SDD) and multivariate ANOVA
Normand et al (2007) | Mean absolute values of differences within examiner and between examiner measurements; ANOVA; Shapiro-Wilk test and SEM for conservative and liberal ICC methods
Table 4 Summary of the methodological quality appraisal results of the studies (n = 17)

Authors | Item 1 | Item 2 | Item 3 | Item 4 | Item 5 | Item 6 | Item 7 | Item 8 | Item 9 | Item 10 | Item 11 | Item 12 | Item 13
Stokes et al (1988) | ✓ | ✗ | ✓ | n/a | n/a | n/a | ✓ | n/a | ✓ | ✓ | ✓ | ✓ | ✓
Pearcy et al (1989) | n/a | ✗ | ✓ | n/a | n/a | n/a | n/a | n/a | ✓ | ✓ | ✓ | n/a | ✓
Smidt et al (1992) (validity) | n/a | ✗ | ✗ | n/a | n/a | n/a | n/a | n/a | ✗ | ✓ | ✗ | n/a | ✓
Smidt et al (1992) (reliability) | ✓ | ✓ | n/a | ✓ | ✓ | ✗ | n/a | ✓ | n/a | ✓ | n/a | ✗ | ✓
Norton et al (1993) | n/a | ✗ | ✗ | n/a | n/a | n/a | n/a | n/a | ✓ | ✓ | ✓ | n/a | ✗
Drerup et al (1994; 1996) | ✗ | ✗ | ✓ | n/a | n/a | n/a | ✓ | n/a | ✓ | ✓ | ✓ | ✓ | ✓
Whittle et al (1997) | ✓ | ✗ | n/a | n/a | ✗ | ✗ | n/a | ✓ | n/a | ✓ | n/a | ✓ | ✓
Normand et al (2002) | n/a | ✗ | ✗ | n/a | n/a | n/a | n/a | n/a | ✗ | ✓ | ✗ | n/a | ✓
Hackenberg et al (2003a) | ✓ | ✗ | ✓ | n/a | n/a | n/a | ✓ | n/a | ✓ | ✗ | ✓ | ✗ | ✓
Hackenberg et al (2003b) | ✓ | ✗ | ✓ | n/a | n/a | n/a | ✓ | n/a | ✓ | ✗ | ✓ | ✗ | ✓
Warren et al (2005) | ✓ | ✗ | n/a | n/a | ✗ | ✗ | n/a | ✓ | n/a | ✓ | n/a | ✗ | ✓
Pazos et al (2005) | n/a | ✗ | ✓ | n/a | n/a | n/a | n/a | n/a | ✓ | ✓ | ✓ | n/a | ✓
Harrison et al (2007) | n/a | ✗ | ✗ | n/a | n/a | n/a | n/a | n/a | ✗ | ✓ | ✗ | n/a | ✓
Janik et al (2007) | n/a | ✗ | ✗ | n/a | n/a | n/a | n/a | n/a | ✗ | ✓ | ✗ | n/a | ✓
Geldhof et al (2007) | ✓ | ✗ | n/a | n/a | ✓ | ✗ | n/a | ✓ | n/a | ✓ | n/a | ✓ | ✓
Pazos et al (2007) | ✓ | ✗ | n/a | n/a | n/a | n/a | n/a | ✓ | n/a | ✓ | n/a | ✗ | ✓
Normand et al (2007) | ✓ | ✓ | n/a | ✓ | ✓ | ✓ | n/a | ✓ | n/a | ✓ | n/a | ✓ | ✓

✓: yes; ✗: no; n/a: not applicable


Drerup et al. [23], Hackenberg et al. [30,31] and 
Stokes et al. [32] scored "yes" as they provided refer- 
ences for the methods used to digitize the radiographs. 
Pazos et al. [35] and Pearcy et al. [36] scored "yes" 
because the authors named and stated the accuracy of 
the instruments used as the reference standard. Norton 
et al. [38] scored "no" because the ruler or tape measure 
was inappropriately used as a reference standard for cal- 
culating 3D coordinates of a point in space. Harrison et 
al. [33], Janik et al. [34], Normand et al. [37] and Smidt 
et al. [22] scored "no" because the authors used an 
object with known 3D parameters as reference stan- 
dards, but the methods to measure these 3D locations, 
angles or distances were not explained. 

Item 4: If interrater reliability were tested, were raters 
blinded to the findings of other raters? 

Normand et al. [26] and Smidt et al. [22] scored "yes"
because subjects were evaluated separately by the
different raters. Geldhof et al. [25], Warren et al. [28] and
Whittle and Levine [29] only tested intrarater reliability
and scored "not applicable". Pazos et al. [27] scored "not
applicable" because no rater reliability was evaluated;
instead, the test-retest reliability of the instrument, when
using different postures, was evaluated.

Item 5: If intrarater reliability were tested, were raters 
blinded to their own prior findings of the test under 
evaluation? 

Geldhof et al. [25], Normand et al. [26] and Smidt et al.
[22] scored "yes" because the raters were sufficiently
blinded to their own prior measurements: repeated
digitizing of the anatomical landmarks took place
one week apart, or all photographs were numbered and were
not identifiable by subject name, occasion or
characteristics, or no skin markings were made on subjects. Warren
et al. [28] and Whittle and Levine [29] scored "no" because
passive markers and skin markings, respectively, were placed only
once on the subject and were not removed between
repeated measurements. Pazos et al. [27] scored "not
applicable" because they did not test rater reliability.

Item 6: Was the order of examination varied? 

Normand et al. [26] scored "yes" because subjects 
were evaluated in random order. Warren et al. [28] and 
Whittle and Levine [29] scored "no" because repeated 
measurements were performed consecutively without 
changing the order of subjects during testing. Geldhof et 
al. [25] scored "no" as the order of testing was kept the 
same for the repeated measurements one week apart. 
Smidt et al. [22] scored "no" as insufficient information 
was provided. Pazos et al. [27] scored "not applicable" 
because no rater reliability was tested. 

Item 7: If human subjects were used, was the time per- 
iod between the reference standard and the index test 
short enough to be reasonably sure that the target condi- 
tion did not change between the two tests? 

Drerup et al. [23], Hackenberg et al. [30,31] and 
Stokes et al. [32] scored "yes" because the radiographs 
and the rasterstereographs were taken on the same day. 
The other seven articles [22,33-38] scored "not applic- 
able" because inanimate objects which cannot deform 
with passage of time were used. 

Item 8: Was the stability (or theoretical stability) of the
variable being measured taken into account when
determining the suitability of the time-interval between
repeated measures?

Six papers scored "yes" because repeated
measurements of posture were taken either on the same day
[22,27-29], one week apart [25] or one day apart [26].

Item 9: Was the reference standard independent of the 
index test? 

Seven papers [23,30-32,35,36,38] scored "yes" because
the index test and the reference standard were
independent instruments. Harrison et al. [33], Janik et al. [34],
Normand et al. [37] and Smidt et al. [22] scored "no"
because insufficient information was provided.

Item 10: Was the execution of the (index) test described 
in sufficient detail to permit replication of the test? 

Nine validity [22,23,32-38] and six reliability papers
[22,25-29] scored "yes" because clear descriptions of
how the instruments were applied to the subjects or to
the inanimate objects were provided. Hackenberg et al.
[30,31] scored "no" as the authors did not explain how
rasterstereographs were performed on the subjects, nor
did they provide any citations for the methodology.

Item 11: Was the execution of the reference standard 
described in sufficient detail to permit its replication? 

Seven papers scored "yes" because clear descriptions of
how the reference standard was used on the subjects
[23,32] or on the inanimate objects [35,36,38], or citations
for the methodology [30,31], were provided. Harrison et
al. [33], Janik et al. [34], Smidt et al. [22] and Normand et
al. [37] scored "no" for the reasons given for item 3.

Item 12: Were withdrawals from the study explained? 

Drerup et al. [23], Geldhof et al. [25], Normand et al.
[26], Stokes et al. [32] and Whittle and Levine [29]
scored "yes" because the number of subjects who
participated in the studies was reflected in the results
sections of the studies. Hackenberg et al. [30,31] scored
"no" as the authors did not explain why 48 instead of 52
and 24 instead of 25 subjects, respectively, participated in
the preoperative evaluations. Pazos et al. [27],
Warren et al. [28] and Smidt et al. [22] scored "no" because
insufficient information was provided. Seven papers
[22,33-38] scored "not applicable" because these studies
used inanimate objects.

Item 13: Were the statistical methods appropriate for 
the purpose of the study? 

All papers except that of Norton et al. [38] implemented
appropriate statistical analysis and scored "yes"; Norton
et al. [38] scored "no". Although the other sixteen papers
reported appropriate statistical analysis, only six papers
[23,30,31,26,28] provided a justification or motivation for
using their chosen statistical measures.

Discussion 

This review attempted to evaluate the quality of
reporting of psychometric properties of 18 3D human
posture-measuring instruments. It identified a lack of
well-documented studies testing the psychometric properties of
these instruments, as papers describing the development
of only eight instruments were found (see Table 1, third
column). The review suggests that the PosturePrint and
rasterstereography had relatively more psychometric
testing than the other tools included in this review.
However, the methodological quality of the testing
procedures for all instruments was flawed when judged
against the methodological criteria applied in this review.

Rater qualification 

Both reliability and validity studies should provide
descriptions of the qualifications of the rater(s) used in
the studies, because the raters' professional background,
expertise and prior training in operating these instruments
will affect psychometric property assessment.
Appropriate training of raters is important to minimise
measurement error, and to facilitate interpretation of findings.
These factors should therefore be considered when 
interpreting study findings, and extrapolating them for 
applicability and generalisability to other clinical and 
research settings [39]. 

Reference standard 

Four studies, which used inanimate objects, did not iden- 
tify the instruments used to obtain the known values of 
objects which provided the reference standard data. In 
order to test validity, it is important that the psychometric
properties of the reference standard be known to
confirm that the reference standard is suitable [39]. The
most suitable non-invasive 3D reference standard for 
postural measurements has not been unanimously deter- 
mined in this field of research. The validity studies that 
used humans also used stereoradiography as reference 
standard, as radiography remains the most accurate 
assessment for posture. This situation continues, even 
though there is a possible health risk for repeated X-ray 
exposure to healthy spines and organs [40].

Norton et al. [38] used a ruler or tape measure as a 
reference standard. The x, y, z coordinates obtained 
from the index test had to be mathematically trans- 
formed to distances between pairs of points before the 
reference data, obtained from the ruler or tape measure, 
could be used. It would have been better had these 
authors used a reference standard with known accuracy 
to measure 3D coordinates directly. The ruler or tape 
measure was also a poor reference standard to use when 
measuring the distance between pairs of points on the 
human skeleton. 

Blinding for intra- or interrater reliability 

The repeated measurements by Geldhof et al. [25] were
performed one week apart; however, the order of the
subjects was fixed. This increases the possibility that
the raters recalled the test outcomes of the previous
measurements, which potentially increases bias. Warren
et al. [28] and Whittle and Levine [29] tested intrarater
reliability; however, the anatomical landmarks were marked
only once before repeated measurements were taken, without
removal and replacement of the markers between repeated
measurements. The raters in these studies were also not
blinded to their previous measurements of the same subjects.
This potentially introduced bias and compromised the quality
of the studies and findings.

Statistical analysis 

Given the complexity of posture measurement and 
interpretation, no statistical strategy for psychometric 
property testing is without its disadvantages. Therefore 
it seems sensible to report the findings of two or more 
different statistical analysis approaches in order to vali- 
date findings [21]. This did not occur in any of the 
included papers. For example, Pearcy et al. [36] used
linear regression analysis to demonstrate that as the
magnitude of one variable increases, so does the
amount of error; however, there is no indication of a
cut-off value (e.g. 95% CI and SD) up to which the 3
Space Isotrak can be expected to measure an angle
accurately.
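
One way such a cut-off could be derived and reported is sketched below in Python. This is an illustrative sketch only and is not the analysis of Pearcy et al. [36]: the angle and error values are synthetic, the function names (`fit_line`, `cutoff_angle`) and the tolerance are hypothetical, and the 1.96 multiplier assumes approximately normally distributed residuals and ignores uncertainty in the fitted line.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares fit y = intercept + slope * x.

    Returns (intercept, slope, residual standard deviation).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # n - 2 degrees of freedom because two parameters were estimated
    resid_sd = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    return intercept, slope, resid_sd

def cutoff_angle(intercept, slope, resid_sd, tolerance, z=1.96):
    """Largest angle at which the approximate upper error bound
    (fitted error + z * residual SD) stays within `tolerance`.

    Returns None if the bound already exceeds the tolerance at zero,
    and math.inf if the fitted error does not grow with the angle.
    """
    upper_at_zero = intercept + z * resid_sd
    if upper_at_zero > tolerance:
        return None
    if slope <= 0:
        return math.inf
    return (tolerance - upper_at_zero) / slope

# Synthetic data (assumed for illustration): true angle in degrees
# and absolute measurement error in degrees.
angles = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
errors = [0.1, 0.3, 0.4, 0.7, 0.8, 1.0]

intercept, slope, resid_sd = fit_line(angles, errors)
# Hypothetical tolerance of 1 degree of acceptable error
print(cutoff_angle(intercept, slope, resid_sd, tolerance=1.0))
```

Reporting the resulting angle together with the tolerance and the residual standard deviation would tell readers the range over which the instrument can be expected to measure within the stated error.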

As a variety of statistical measures were reported in
this review, another method to improve reporting quality
would be for authors to justify why they chose a particular
statistical test, relevant to the purpose of testing.
This would provide the reader with better insight into
the results, and would perhaps guide future authors in
the choice and interpretation of more appropriate statistical
analysis. For example, Norton et al. [38] used multiple
analyses to determine whether there is agreement
between measures. However, the Pearson product moment
correlation only reports on the correlation between two
different measurements and cannot quantify the amount
of agreement or indicate whether there is systematic
error. Repeated t-tests are also inappropriate for testing
systematic differences, as this testing will inflate the type I
error and compromise interpretation of significance.
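
Both points can be illustrated with a short Python sketch. The data are synthetic and the function names are hypothetical. The limits-of-agreement calculation follows the general Bland and Altman approach, which is offered here as one possible alternative and is not a method prescribed by this review. The type I error calculation assumes the repeated tests are independent.

```python
import math
from statistics import mean, stdev

def pearson_r(a, b):
    """Pearson product moment correlation between two paired series."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def limits_of_agreement(a, b, z=1.96):
    """Mean difference (bias) and approximate 95% limits of agreement.

    Assumes the paired differences are roughly normally distributed.
    """
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - z * sd, bias + z * sd

def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across n_tests
    independent tests, each run at significance level alpha."""
    return 1 - (1 - alpha) ** n_tests

# Synthetic example: the index test reads a constant 5 units higher
# than the reference standard, i.e. a purely systematic error.
reference = [10.0, 20.0, 30.0, 40.0, 50.0]
index_test = [r + 5.0 for r in reference]

# Correlation is essentially perfect despite the offset...
print(pearson_r(reference, index_test))
# ...whereas the bias and limits of agreement expose it.
print(limits_of_agreement(index_test, reference))
# Type I error inflation for ten t-tests at alpha = 0.05
print(familywise_error_rate(0.05, 10))
```

In this example the correlation cannot distinguish the two measures, while the mean difference reveals the systematic offset. Under the independence assumption, ten tests at the 0.05 level carry roughly a 40% chance of at least one spurious significant result.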

Limitations 

One limitation to this review comes from our inability 
to retrieve potentially eligible papers from authors who 
failed to respond to email inquiries. It could be that 
there are other relevant instruments which have been 
adequately evaluated for reliability and validity; however,
these papers were not available despite the use of multiple
search methods (database, internet and author
searches).



Conclusions 

This review described 18 non-invasive ways of measur- 
ing static human 3D sitting or standing spinal posture, 
and the methodological procedures of testing reliability 
and validity of a subset of these instruments. The review 
concludes that further research into the reliability and 
validity testing of these instruments is required to 
improve the quality of reliability and validity evidence of 
posture-measuring instruments. Psychometric property
testing should be improved by addressing rater qualification,
defining the reference standards more clearly,
applying appropriate methodological procedures to
enhance rater blinding and improving the quality of
reported statistical analysis. Improving the methodological
rigor of reliability and validity testing would
consequently enhance users' confidence in the psychometric
evidence for instruments that measure static human 3D sitting
or standing spinal posture in clinical and research settings.